Character Segmentation Scheme for OCR System : For Myanmar Printed
نویسندگان
چکیده
Automatic machine-printed Optical Characters or texts Recognizers (OCR) are highly desirable for a multitude of modern IT applications, including Digital Library software. However, the state of the art OCR systems cannot do for Myanmar scripts as the language poses many challenges for document understanding. Therefore, the authors design an Optical Character Recognition System for Myanmar Printed Document (OCRMPD), with several proposed techniques that can automatically recognize Myanmar printed text from document images. In order to get more accurate system, the authors propose the method for isolation of the character image by using not only the projection methods but also structural analysis for wrongly segmented characters. To reveal the effectiveness of the segmentation technique, the authors follow a new hybrid feature extraction method and choose the SVM classifier for recognition of the character image. The proposed algorithms have been tested on a variety of Myanmar printed documents and the results of the experiments indicate that the methods can increase the segmentation accuracy as well as recognition rates. DOI: 10.4018/978-1-4666-3906-5.ch018
منابع مشابه
Character Segmentation Scheme for OCR System: For Myanmar Printed Documents
Automatic machine-printed Optical Characters or texts Recognizers (OCR) are highly desirable for a multitude of modern IT applications, including Digital Library software. However, the state of the art OCR systems cannot do for Myanmar scripts as the language poses many challenges for document understanding. Therefore, the authors design an Optical Character Recognition System for Myanmar Print...
متن کاملBilingual OCR System for Myanmar and English Scripts with Simultaneous Recognition
Htwe Pa Pa Win, Phyo Thu Thu Khine, Khin Nwe Ni Tun AbstractThe increasing amount of development of the digital libraries worldwide raises many new challenges for document image analysis research and development. Storing wide variety of document images in Digital library, for example, for cultural, technical or historical, that are written in many languages, also create many advancement for pre...
متن کاملOCR for printed Kannada text to Machine editable format using Database approach
This paper describes an Optical Character Recognition (OCR) system for printed text documents in Kannada, a South Indian language. The proposed OCR system for the recognition of printed Kannada text, which can handle all types of Kannada characters. The system first extracts image of Kannada scripts, then from the image to line segmentation then segments the words into sub-character level piece...
متن کاملA Structural Analysis Based Feature Extraction Method for OCR System For Myanmar Printed Document Images
This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors to achieve high recognition performance in Optical Character Recognition (OCR) system is the selection of the feature extraction methods. Different types of existing OCR systems used various feature extraction methods because of the diversity of the script...
متن کاملA Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling
In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016